Helicobacter pylori causes human gastroduodenal diseases, including chronic gastritis and peptic ulcer disease. It is also a major microbial risk factor for the development of gastric adenocarcinoma and mucosa-associated lymphoid tissue (MALT) lymphoma. Twenty-one strains with different antimicrobial susceptibility patterns, isolated from patients of different ethnicities and disease statuses, were sequenced using the Illumina HiSeq and PacBio RS platforms.
Clinical outcomes of Helicobacter pylori infection vary among individuals (1), especially among those of different ethnic origins (2). Such variation is a hallmark of this Gram-negative, curved bacterium that resides in the human stomach. Hypothetically, it could arise from the diversity of pathogenic genes present in the H. pylori strains infecting different ethnic groups (1), from geographically distinct DNA polymorphisms of H. pylori (2, 3), or from the lifestyles of people of different ethnic groups (4). Multiracial Malaysia is no exception to this phenomenon (5). Together with 10 previously announced genome sequences (6), which have been reassembled using newer algorithms, we present here 21 genome sequences of H. pylori strains with different antimicrobial susceptibility patterns, isolated from patients of different ethnicities and disease statuses who attended the endoscopy unit at the University of Malaya Medical Center (UMMC) (Table 1).

H. pylori DNA was isolated using the RTP bacteria DNA minikit (Invitek GmbH, Berlin, Germany). Whole-genome sequencing was performed with 100-base paired-end reads on the Illumina HiSeq 2000 instrument (Illumina, Inc., San Diego, CA) at the Malaysian Genomics Resource Centre Berhad (MGRC) (Kuala Lumpur, Malaysia). Each sample was assembled at its optimal k-mer size using the ABySS assembler version 1.3.4 (7). Assembled contigs were scaffolded with SSPACE using the paired-end read information from each sample (8). Genes were predicted from the assembled scaffolds using GeneMark version 2.5 (9), and the predicted gene sequences were aligned against the UniProt (Swiss-Prot/TrEMBL) database using SynaSearch (MGRC) for annotation.
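
The text does not say how the "optimal" k-mer size was chosen for each ABySS run. One common criterion is to assemble at several k values and keep the assembly with the largest contig N50; the minimal Python sketch below assumes that criterion. The function names and contig lengths are hypothetical illustrations, not output from this study.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


def pick_k_by_n50(assemblies: Dict[int, List[int]]) -> int:
    """Hypothetical selection rule: return the k whose assembly has the
    largest N50. Ties go to the smaller k, because max() keeps the first
    maximal key when iterating over the keys in ascending order."""
    return max(sorted(assemblies), key=lambda k: n50(assemblies[k]))


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) from ABySS runs at three k values.
    runs = {
        55: [400_000, 300_000, 250_000, 200_000, 150_000, 100_000],
        64: [600_000, 400_000, 300_000, 150_000, 100_000],
        75: [500_000, 450_000, 350_000, 200_000, 100_000],
    }
    for k in sorted(runs):
        print(k, n50(runs[k]))
    print("best k:", pick_k_by_n50(runs))
```

Maximizing N50 alone can reward misassemblies, so in practice it is usually weighed against total assembly length and contig count.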

Two strains, UM037 and UM066, were also sequenced using Pacific Biosciences RS sequencing technology (Pacific Biosciences, Menlo Park, CA), yielding >20× average genome coverage. Each sample was prepared as a 10-kb insert library using C2 chemistry and sequenced on 8 single-molecule real-time (SMRT) cells. De novo assembly of the continuous long reads (CLR) was carried out following the Hierarchical Genome Assembly Process (HGAP) workflow (PacBio DevNet; Pacific Biosciences) as available in SMRT Analysis v2.0. The genomes were annotated with the NCBI (National Center for Biotechnology Information) Prokaryotic Genomes Automatic Annotation Pipeline. Using the PacBio workflow, the H. pylori UM037 and UM066 genome sequences were each assembled as a single contig, of 1,692,823 bp and 1,660,425 bp, respectively. The NCBI annotation predicted 1,677 and 1,637 open reading frames (ORFs) (1,637 and 1,595 annotated genes) for UM037 and UM066, respectively.
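
Average coverage is the total number of sequenced bases divided by the genome length, so the ">20×" figure implies a lower bound on the PacBio read yield for each strain. The minimal sketch below applies that formula to the assembled genome sizes reported above; the function names are illustrative and the 40-Mb yield in the usage lines is a hypothetical value, not a figure from this study.

```python
import math

# Assembled genome sizes reported in the text (bp).
GENOME_LENGTHS = {"UM037": 1_692_823, "UM066": 1_660_425}


def mean_coverage(total_read_bases: int, genome_length: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_read_bases / genome_length


def bases_needed(target_depth: float, genome_length: int) -> int:
    """Minimum total read bases for a target average depth, rounded up."""
    return math.ceil(target_depth * genome_length)


if __name__ == "__main__":
    # Lower bound on read yield implied by 20x coverage for each strain.
    for strain, length in GENOME_LENGTHS.items():
        print(strain, bases_needed(20, length))
    # Hypothetical 40-Mb yield, for illustration only.
    print(round(mean_coverage(40_000_000, GENOME_LENGTHS["UM037"]), 1))
```

This is an average only; long-read depth varies along the genome, and HGAP additionally depends on having enough long reads to seed the preassembly step.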

The availability of these H. pylori genome sequences from individuals of different ethnic backgrounds with distinct clinical presentations provides the research community with a resource for detailed investigations into the genetic elements that correlate with bacterial evolution, compensatory mechanisms, host adaptation, gastric pathogenesis, and the selective pressure exerted by antimicrobial agents. Furthermore, these data sets enable a comparison of the Illumina HiSeq and PacBio RS sequencing platforms for H. pylori genomes.

Nucleotide sequence accession numbers. The H. pylori genome sequences described in this paper have been deposited as draft whole-genome shotgun projects in DDBJ/EMBL/GenBank under the accession numbers stated in Table 1. The versions described in this paper are the first versions.
TABLE 1 Sequencing statistics, genome information, strain characteristics, and accession numbers for 21 H. pylori strains
a Illumina HiSeq 2000 (draft whole-genome sequence).
b PacBio SMRT (complete genome sequence).
c PID, percent identity; CH, clarithromycin resistant; FQ, fluoroquinolone resistant; GC, gastric cancer; MZ, metronidazole resistant; NUD, nonulcer dyspepsia; PUD, peptic ulcer disease.